Context-dependent duration modelling for continuous speech recognition
Abstract
This paper presents a pilot study of using context-dependent segmental duration for continuous speech recognition in a domain-specific application. Different modelling strategies are proposed for function words and content words. Stress level, word position in the utterance, and phone position in the word are identified as the three most crucial factors affecting segmental duration in this particular application. In addition, speaking rate normalization is applied to further reduce duration variability. Experimental results show that the normalized duration models can help improve the rank of the correct sentence in the N-best hypotheses.
Related papers
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
Context-dependent word duration modelling for robust speech recognition
Conventional hidden Markov models (HMMs) have weak duration constraints. This may cause the decoder to produce word matches with unrealistic durations in noisy situations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly in a multi-stack decoding algorithm. The proposed model is capable of penalising duration constraints of a wor...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
On parameter filtering in continuous subword-unit-based speech recognition
Simple IIR or FIR filters have been widely used in isolated or connected word recognition tasks to filter the time sequence of speech spectral parameters, since, despite their simplicity, they significantly improve recognition performance. Those filters, when applied to continuous speech recognition, where phoneme-sized modelling units are used, induce spectral transition spreading and a cross-...
Temporal constraints in Viterbi alignment for speech recognition in noise
This paper addresses the problem of temporal constraints in the Viterbi algorithm using conditional transition probabilities. The results presented here suggest that in a speaker-dependent small-vocabulary task the statistical modelling of state durations is not relevant if the max and min state duration restrictions are imposed, and that truncated probability densities give better results than...